Method for Non-Linear Distortion Immune End-to-End Learning with Autoencoder-OFDM

ABSTRACT

Disclosed is a new layer tailored for Artificial Intelligence-based communication systems that limits the instantaneous peak power of the signals and relies on manipulation of complementary sequences through neural networks. Also disclosed is a method for avoiding non-linear distortion in end-to-end learning communication systems, the communication system comprising a transmitter and a receiver. The method includes mapping transmitted information bits to an input of a first neural network; controlling, by an output of the neural network, parameters of a complementary sequence (CS) encoder, producing an encoded CS; transmitting the encoded CS through an orthogonal frequency division multiplexing (OFDM) signal; processing, by a Discrete Fourier Transform (DFT), the encoded CS to produce a received information signal in a frequency domain; and processing, by a second neural network, the received information signal.

CROSS REFERENCE TO RELATED APPLICATION

This Application claims priority to U.S. Provisional Patent Application No. 62/913,776, filed Oct. 11, 2019, titled "Methods for Non-Linear Distortion Immune End-to-End Learning with Autoencoder-OFDM."

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a new layer tailored for Artificial Intelligence-based communication systems to limit the instantaneous peak power for the signals, which relies on manipulation of complementary sequences through neural networks.

Description of Related Art

Traditional end-to-end learning (e.g., auto-encoder orthogonal frequency division multiplexing (AE-OFDM)) methods do not provide a permanent solution for the peak-to-average power ratio (PAPR) without a rigorous training/optimization procedure, which may potentially increase the training complexity in practice. Solutions that control instantaneous power fluctuations are needed for artificial intelligence (AI) based transmitters and receivers to decrease the training complexity. Accordingly, it is an object of the present invention to provide a new layer tailored for AI-based communication systems to limit the instantaneous peak power for the signals, which relies on manipulation of complementary sequences through neural networks. The current disclosure also provides how to stabilize the mean power and peak power by using the algebraic representation of the complementary sequences.

SUMMARY OF THE INVENTION

In one aspect of the present disclosure, a method for avoiding non-linear distortion in end-to-end learning communication systems is provided, the communication system comprising a transmitter and a receiver. The method includes mapping transmitted information bits to an input of a first neural network; controlling, by an output of the neural network, parameters of a complementary sequence (CS) encoder, producing an encoded CS; transmitting the encoded CS through an orthogonal frequency division multiplexing (OFDM) signal; processing, by Discrete Fourier Transform (DFT), the encoded CS in a frequency domain, to produce a received information signal; and processing, by a second neural network, the received information signal.

In one embodiment of this aspect, the CS encoder comprises an amplitude encoder, a phase encoder, and a shift encoder. In another embodiment, mapping the transmitted information bits to an input of a first neural network further comprises manually tuning the shift encoder to adjust a position of non-zero elements of the CS; tuning the amplitude encoder and the phase encoder using the first neural network to produce tuned parameters; and mapping the information bits to the tuned parameters.

In another embodiment, the encoded CS is processed by multiple layers at the transmitter, the layers including at least a Golay layer, the method further comprising: controlling only the amplitude encoder and the phase encoder of the Golay layer; and forming an autoencoder which captures transmitter, channel, and receiver behaviors. In another embodiment, sets for the amplitude encoder and phase encoder are predetermined and offset by the first neural network in order to be able to transmit a large number of information bits.

In another embodiment, the receiver further comprises a decoder, the method further comprising: subtracting, by the second neural network, the offsets from the received information signal to produce a remaining information signal; and decoding, by the decoder, the remaining information signal. In another embodiment, the layers further include a clipping layer configured to limit the amplitude of the information signal. In another embodiment, the layers further include a Polar-to-Cartesian layer configured to convert the coordinate system from Polar coordinates to a Cartesian coordinate system.

In another aspect of the present disclosure, an end-to-end learning communication system for avoiding non-linear distortion is provided. The system includes: a transmitter implemented by processing circuitry, the processing circuitry comprising a processor and a memory containing instructions executable by the processor, the processor of the transmitter configured to: map transmitted information bits to an input of a first neural network; control, by an output of the neural network, parameters of a complementary sequence (CS) encoder, producing an encoded CS; and transmit the encoded CS through an orthogonal frequency division multiplexing (OFDM) signal. The system also includes a receiver implemented by processing circuitry, the processing circuitry comprising a processor and a memory containing instructions executable by the processor, the processor of the receiver configured to: process, by Discrete Fourier Transform (DFT), the encoded CS in a frequency domain, to produce a received information signal; and process, by a second neural network, the received information signal.

In one embodiment of this aspect, the CS encoder comprises an amplitude encoder, a phase encoder, and a shift encoder. In another embodiment, mapping the transmitted information bits to an input of a first neural network further comprises: manually tuning, by the processor of the transmitter, the shift encoder to adjust a position of non-zero elements of the CS; tuning, by the processor of the transmitter, the amplitude encoder and the phase encoder using the first neural network to produce tuned parameters; and mapping, by the processor of the transmitter, the information bits to the tuned parameters.

In another embodiment, the encoded CS is processed by multiple layers at the transmitter, the layers including at least a Golay layer, the processor of the transmitter further configured to control only the amplitude encoder and the phase encoder of the Golay layer, and form an autoencoder which captures transmitter, channel, and receiver behaviors. In another embodiment, sets for the amplitude encoder and phase encoder are predetermined and offset by the first neural network in order to be able to transmit a large number of information bits. In another embodiment, the processor of the receiver is further configured to: subtract, by the second neural network, the offsets from the received information signal to produce a remaining information signal; and decode, by the decoder, the remaining information signal. In another embodiment, the layers further include a clipping layer configured to limit the amplitude of the information signal. In another embodiment, the layers further include a Polar-to-Cartesian layer configured to convert the coordinate system from Polar coordinates to a Cartesian coordinate system.

BRIEF DESCRIPTION OF THE DRAWINGS

The construction designed to carry out the invention will hereinafter be described, together with other features thereof. The invention will be more readily understood from a reading of the following specification and by reference to the accompanying drawings forming a part thereof, wherein an example of the invention is shown and wherein:

FIG. 1 illustrates an exemplary communications system in accordance with embodiments of the present disclosure;

FIG. 2 illustrates an exemplary communications device in accordance with embodiments of the present disclosure;

FIG. 3 shows tuning the CS encoder for OFDM with a DNN and demodulating with another DNN;

FIG. 4 shows a transmitter diagram for real-valued output of the shift encoder;

FIG. 5 shows controlling only the amplitude and phase encoders of the Golay layer with a DNN over an OFDM transmission/reception and forming an autoencoder which captures transmitter, channel, and receiver behaviors;

FIG. 6 shows block error rate, bit error rate, and spectral efficiency of the AI-based learning with the Golay layer;

FIG. 7 shows a PAPR comparison;

FIG. 8 shows the distribution of the elements of the sequence on different subcarriers (i.e., learned constellation per subcarrier); and

FIG. 9 shows Table 1: Layer Information at the Transmitter, Channel, and Receiver.

It will be understood by those skilled in the art that one or more aspects of this invention can meet certain objectives, while one or more other aspects can meet certain other objectives. Each objective may not apply equally, in all its respects, to every aspect of this invention. As such, the preceding objects can be viewed in the alternative with respect to any one aspect of this invention. These and other objects and features of the invention will become more fully apparent when the following detailed description is read in conjunction with the accompanying figures and examples. However, it is to be understood that both the foregoing summary of the invention and the following detailed description are of a preferred embodiment and are not restrictive of the invention or other alternate embodiments of the invention. In particular, while the invention is described herein with reference to a number of specific embodiments, it will be appreciated that the description is illustrative of the invention and is not to be construed as limiting of the invention. Various modifications and applications may occur to those who are skilled in the art, without departing from the spirit and the scope of the invention, as described by the appended claims. Likewise, other objects, features, benefits and advantages of the present invention will be apparent from this summary and certain embodiments described below, and will be readily apparent to those skilled in the art. Such objects, features, benefits and advantages will be apparent from the above in conjunction with the accompanying examples, data, figures and all reasonable inferences to be drawn therefrom, alone or with consideration of the references incorporated herein.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

With reference to the drawings, the invention will now be described in more detail. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the presently disclosed subject matter belongs. Although any methods, devices, and materials similar or equivalent to those described herein can be used in the practice or testing of the presently disclosed subject matter, representative methods, devices, and materials are herein described.

Unless specifically stated, terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. Likewise, a group of items linked with the conjunction "and" should not be read as requiring that each and every one of those items be present in the grouping, but rather should be read as "and/or" unless expressly stated otherwise. Similarly, a group of items linked with the conjunction "or" should not be read as requiring mutual exclusivity among that group, but rather should also be read as "and/or" unless expressly stated otherwise.

Furthermore, although items, elements or components of the disclosure may be described or claimed in the singular, the plural is contemplated to be within the scope thereof unless limitation to the singular is explicitly stated. The presence of broadening words and phrases such as "one or more," "at least," "but not limited to" or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

In view of the apparatuses and methods further disclosed herein, exemplary embodiments may be implemented in the context of a communications system 10 as shown in FIG. 1. The communications system 10 may be a complex system of intermediate devices that support communications between communications device 100 and communications device 200, or the communications device 100 and communications device 200 may have a direct link 150, as shown in FIG. 1. In either case, the communications devices 100 and 200 may be configured to support wireless communications. In the context of this disclosure, communications device 100 may be a transmitter and communications device 200 may be a receiver. Thus, below, reference designator "100" will be used interchangeably to identify a communication device and sometimes be used to identify the transmitter, while reference designator "200" will be used to identify a communication device and sometimes be used to identify the receiver.

In this regard, the system 10 may include any number of communications devices, including communications devices 100 and 200. Although not shown, the communications devices may be physically coupled to a stationary unit (e.g., a base station or the like) or a mobile unit (e.g., a mobile terminal such as a cellular phone, a vehicle such as an aerial vehicle, a smart device with IoT capabilities, or the like).

The communications device 100 may comprise, among other components, processing circuitry 101, a radio 110, and an antenna 115. As further described below, the processing circuitry 101 may be configured to control the radio 110 to transmit and receive wireless communications via the antenna 115. In this regard, a wireless communications link 150 may be established between the antenna 115 and the antenna 215 of the communications device 200. Similarly, the communications device 200 may comprise, among other components, processing circuitry 201, radio 210, and the antenna 215. The processing circuitry 201 may be configured the same as or similar to the processing circuitry 101, and thus may be configured to control the radio 210 to transmit and receive wireless communications via the antenna 215.

In this regard, FIG. 2 shows a more detailed version of the communications device 100, and in particular the processing circuitry 101. Communication device 100 may also be considered part of the transmitter of the present disclosure. Again, as shown in FIG. 2, the communications device 100 may comprise the processing circuitry 101, the radio 110, and the antenna 115. However, the link 150 is shown as being a communications link to communications device 200, or as a communications link to the network 120, which may be any type of wired or wireless communications network.

The processing circuitry 101 may be configured to receive inputs and provide outputs in association with the various functionalities of the communications device 100. In this regard, the processing circuitry 101 may comprise, for example, a memory 102, a processor 103, a user interface 104, and a communications interface 105. The processing circuitry 101 may be operably coupled to other components of the communications device 100 or other components of a device that comprises the communications device 100.

Further, according to some example embodiments, processing circuitry 101 may be in operative communication with or embody the memory 102, the processor 103, the user interface 104, and the communications interface 105. Through configuration and operation of the memory 102, the processor 103, the user interface 104, and the communications interface 105, the processing circuitry 101 may be configurable to perform various operations as described herein. In this regard, the processing circuitry 101 may be configured to perform computational processing, memory management, user interface control and monitoring, and manage remote communications, according to an example embodiment. In other words, the processing circuitry 101 may comprise one or more physical packages (e.g., chips) including materials, components or wires on a structural assembly (e.g., a baseboard). The processing circuitry 101 may be configured to receive inputs (e.g., via peripheral components), perform actions based on the inputs, and generate outputs (e.g., for provision to peripheral components). In an example embodiment, the processing circuitry 101 may include one or more instances of a processor 103, associated circuitry, and memory 102. As such, the processing circuitry 101 may be embodied as a circuit chip (e.g., an integrated circuit chip, such as a field programmable gate array (FPGA)) configured (e.g., with hardware, software or a combination of hardware and software) to perform operations described herein.

In an example embodiment, the memory 102 may include one or more non-transitory memory devices such as, for example, volatile or non-volatile memory that may be either fixed or removable. The memory 102 may be configured to store information, data, applications, instructions or the like. The memory 102 may operate to buffer instructions and data during operation of the processing circuitry 101 to support higher-level functionalities, and may also be configured to store instructions for execution by the processing circuitry 101. The memory 102 may also store image data, equipment data, crew data, and a virtual layout as described herein. According to some example embodiments, such data may be generated based on other data and stored, or the data may be retrieved via the communications interface 105 and stored.

As mentioned above, the processing circuitry 101 may be embodied in a number of different ways. For example, the processing circuitry 101 may be embodied as various processing means such as one or more processors 103 that may be in the form of a microprocessor or other processing element, a coprocessor, a controller or various other computing or processing devices including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA, or the like. In an example embodiment, the processing circuitry 101 may be configured to execute instructions stored in the memory 102 or otherwise accessible to the processing circuitry 101. As such, whether configured by hardware or by a combination of hardware and software, the processing circuitry 101 may represent an entity (e.g., physically embodied in circuitry, in the form of processing circuitry 101) capable of performing operations according to example embodiments while configured accordingly. Thus, for example, when the processing circuitry 101 is embodied as an ASIC, FPGA, or the like, the processing circuitry 101 may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processing circuitry 101 is embodied as an executor of software instructions, the instructions may specifically configure the processing circuitry 101 to perform the operations described herein.

The communication interface 105 may include one or more interface mechanisms for enabling communication by controlling the radio 110 to generate the communications link 150. In some cases, the communication interface 105 may be any means such as a device or circuitry embodied in either hardware, or a combination of hardware and software, that is configured to receive or transmit data from/to devices in communication with the processing circuitry 101. The communications interface 105 may support wireless communications via the radio 110 using various communications protocols (802.11 Wi-Fi, Bluetooth, cellular, WLAN, 3GPP NR, or the like).

The user interface 104 may be controlled by the processing circuitry 101 to interact with peripheral devices that can receive inputs from a user or provide outputs to a user. In this regard, via the user interface 104, the processing circuitry 101 may be configured to provide control and output signals to a peripheral device such as, for example, a keyboard, a display (e.g., a touch screen display), mouse, microphone, speaker, or the like. The user interface 104 may also produce outputs, for example, as visual outputs on a display, audio outputs via a speaker, or the like.

The radio 110 may be any type of physical radio comprising radio components. For example, the radio 110 may include components such as a power amplifier, mixer, local oscillator, modulator/demodulator, and the like. The components of the radio 110 may be configured to operate in a plurality of spectral bands. Further, the radio 110 may be configured to receive signals from the processing circuitry 101 for transmission to the antenna 115. In some example embodiments, the radio 110 may be a software defined radio.

The antenna 115 may be any type of wireless communications antenna. The antenna 115 may be configured to transmit and receive at more than one frequency or band. In this regard, according to some example embodiments, the antenna 115 may be an array of antennas that may be configured by the radio 110 to support various types of wireless communications as described herein.

AI-based communication systems utilize machine learning modules (e.g., deep learning) to replace the functionality of the highly engineered blocks (e.g., coding, modulation, waveform, etc.) in the physical layer of communication systems. However, machine learning methods from the field of computer vision may not be directly applied to communication systems, as communication systems may introduce different challenges. One of the challenges related to communication systems is the high instantaneous power fluctuations. Although there are some methods in the literature that limit the instantaneous peak power, these methods may require further training to overcome the non-linearities. In this invention, we solve the instantaneous peak power problem of AI-based communication systems by introducing a new layer, i.e., the Golay layer.

In this document, we disclose a new layer (Golay layer) tailored for AI-based communication systems to limit the instantaneous peak power for the signals. The disclosed methods can also be applied to communication devices that operate under power-limited link budgets while autonomously decreasing the error rate for auto-encoder orthogonal frequency division multiplexing. The invention may be part of a wireless standard that allows AI-based communication systems. The disclosed method may also decrease the training complexity. The embodiment relies on the manipulation of complementary sequences through neural networks. We disclose how to stabilize the mean power and peak power by using the algebraic representation of the complementary sequences. The introduced layer may be used with several other basic layers, such as the clipping layer and the Polar-to-Cartesian layer 510, as the Golay layer 505 operates in polar coordinates. By changing the parameters of the Golay layer 505, it can also allow a constant-amplitude sequence in the frequency domain.

Motivation and Problem Statement

Traditional end-to-end learning (e.g., auto-encoder orthogonal frequency division multiplexing (AE-OFDM)) methods do not provide a permanent solution for PAPR without a rigorous training/optimization procedure, which may potentially increase the training complexity in practice. Solutions that control instantaneous power fluctuations are needed for artificial intelligence (AI) based transmitters and receivers to decrease the training complexity.

Sequences and Waveforms

The polynomial representation of the sequence a = (a₀, a₁, . . . , a_{N−1}) is given by

$p_a(z) = a_{N-1} z^{N-1} + a_{N-2} z^{N-2} + \ldots + a_0. \qquad (1)$

Based on the polynomial representation, the following interpretationscan be made:

- If $z \in \left\{ e^{\frac{j 2\pi t}{T}} \,\middle|\, 0 \leq t < T \right\}$, $p_a(z)$ is equivalent to an OFDM signal in time, where T is the OFDM symbol duration and the frequency domain coefficients are the elements of a, with a₀ mapped to the DC tone.

- If $z \in \left\{ e^{\frac{j 2\pi t}{T}} \,\middle|\, 0 \leq t < T \right\}$, the instantaneous power of an OFDM symbol can be calculated as $|p_a(z)|^2 = p_a(z) p_{a^*}(z^{-1})$ since $p_{a^*}(z^{-1}) = (p_a(z))^*$. Thus, the peak-to-average-power ratio (PAPR) of $p_a(z)$ can be obtained by using $|p_a(z)|^2$ within a period of $z = e^{\frac{j 2\pi t}{T}}$ where $t \in [0, T)$.
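As an illustration of this interpretation, the following Python sketch (provided only as an illustrative aid; the example sequence a and the oversampling factor L are arbitrary choices, not values taken from the disclosure) evaluates $p_a(z)$ on the unit circle and computes the PAPR of the resulting OFDM symbol.

```python
import numpy as np

# Illustrative sketch: evaluate p_a(z) on the unit circle and measure PAPR.
a = np.array([1, 1, 1, -1])        # arbitrary example sequence (frequency domain)
L = 16                             # oversampling factor for the time-domain signal
N = len(a)
t = np.arange(N * L) / (N * L)     # normalized time t/T in [0, 1)
z = np.exp(1j * 2 * np.pi * t)     # z = e^{j 2 pi t / T}
p = np.polyval(a[::-1], z)         # p_a(z) = a_{N-1} z^{N-1} + ... + a_0
papr = np.max(np.abs(p) ** 2) / np.mean(np.abs(p) ** 2)
print(f"PAPR = {10 * np.log10(papr):.2f} dB")
```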

Representation of a Sequence

Let f be a function that maps from $\mathbb{Z}_2^m = \{(x_1, x_2, \ldots, x_m) \mid x_i \in \{0, 1\}\}$ to $\mathbb{Z}_H$ as

$f(x_1, x_2, \ldots, x_m): \mathbb{Z}_2^m \rightarrow \mathbb{Z}_H. \qquad (2)$

We associate a sequence f of length $2^m$ with the function $f(x_1, x_2, \ldots, x_m)$ by listing its values as $(x_1, x_2, \ldots, x_m)$ ranges over its $2^m$ values in lexicographic order. In other words, the (x+1)th element of the sequence f is equal to $f(x_1, x_2, \ldots, x_m)$ where $x = \sum_{j=1}^{m} x_j 2^{m-j}$ (i.e., the most significant bit is x₁). The sequences x and f(x) denote $(x_1, x_2, \ldots, x_m)$ and $f(x_1, x_2, \ldots, x_m)$, respectively. Note that if $\mathbb{Z}_H = \mathbb{Z}_2$, f(x) is a Boolean function. If the codomain is $\mathbb{Z}_H$, f(x) is called a generalized Boolean function.

Algebraic Normal Form (ANF)

A generalized Boolean function can be uniquely expressed as a linear combination over $\mathbb{Z}_H$ of the monomials as

$f(x_1, x_2, \ldots, x_m) = f(x) = \sum_{k=0}^{2^m - 1} c_k \underbrace{\prod_{j=1}^{m} x_j^{k_j}}_{k\text{th monomial}} \qquad (3)$

where the coefficient of each monomial belongs to $\mathbb{Z}_H$, i.e., $c_k \in \mathbb{Z}_H$, $k = \sum_{j=1}^{m} k_j 2^{m-j}$, and $x_j \in \mathbb{Z}_2$. Note that the monomials, e.g., 1, x₁, x₂, x₁x₂, . . . , and x₁x₂ . . . x_m, are linearly independent. Linear independence can be proven by using the definition of linear independence, i.e., a linear combination of the monomials is zero for all x if and only if all of its coefficients are zero.

For example, let m=3 and H=4. Then

$\begin{matrix}{{f\left( {x_{1},x_{2},x_{3}} \right)} = {{c_{0}x_{1}^{0}x_{2}^{0}x_{3}^{0}} + {c_{1}x_{1}^{0}x_{2}^{0}x_{3}^{1}} + {c_{2}x_{1}^{0}x_{2}^{1}x_{3}^{0}} + {c_{3}x_{1}^{0}x_{2}^{1}x_{3}^{1}} + {c_{4}x_{1}^{1}x_{2}^{0}x_{3}^{0}} + {c_{5}x_{1}^{1}x_{2}^{0}x_{3}^{1}} + {c_{6}x_{1}^{1}x_{2}^{1}x_{3}^{0}} + {c_{7}x_{1}^{1}x_{2}^{1}{x_{3}^{1}.}}}} & (4)\end{matrix}$

Assume that c_0=3 and c_5=2 and other c_n=0 for n=1, 2, 3, 4, 6, 7.Then,

f(x ₁ , x ₂ , x ₃)=3+2x ₁ x ₃  (5)

As described herein, we associate a sequence f of length $2^m$ with the function $f(x_1, x_2, \ldots, x_m)$ by listing its values as $(x_1, x_2, \ldots, x_m)$ ranges over its $2^m$ values in lexicographic order. In other words, the (x+1)th element of the sequence f is equal to $f(x_1, x_2, \ldots, x_m)$ where $x = \sum_{j=1}^{m} x_j 2^{m-j}$ (i.e., the most significant bit is x₁):

f(x ₁=0,x ₂=0,x ₃=0)=3+2x ₁ x ₃=3 mod 4=3

f(x ₁=0,x ₂=0,x ₃=1)=3+2x ₁ x ₃=3 mod 4=3

f(x ₁=0,x ₂=1,x ₃=0)=3+2x ₁ x ₃=3 mod 4=3

f(x ₁=0,x ₂=1,x ₃=1)=3+2x ₁ x ₃=3 mod 4=3

f(x ₁=1,x ₂=0,x ₃=0)=3+2x ₁ x ₃=3 mod 4=3

f(x ₁=1,x ₂=0,x ₃=1)=3+2x ₁ x ₃=5 mod 4=1

f(x ₁=1,x ₂=1,x ₃=0)=3+2x ₁ x ₃=3 mod 4=3

f(x ₁=1,x ₂=1,x ₃=1)=3+2x ₁ x ₃=5 mod 4=1

Therefore, f(x₁, x₂, x₃) = 3 + 2x₁x₃ leads to the sequence f = (3, 3, 3, 3, 3, 1, 3, 1).
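The following Python sketch (an illustrative aid, using the example values m = 3 and H = 4 above) reproduces this sequence by listing the values of f in lexicographic order:

```python
# Illustrative sketch: list the values of f(x1, x2, x3) = 3 + 2*x1*x3 over Z_4
# in lexicographic order (x1 is the most significant bit).
H, m = 4, 3
seq = []
for x in range(2 ** m):
    bits = [(x >> (m - 1 - j)) & 1 for j in range(m)]  # (x1, x2, ..., xm)
    x1, x2, x3 = bits
    seq.append((3 + 2 * x1 * x3) % H)
print(seq)  # [3, 3, 3, 3, 3, 1, 3, 1], i.e., f = (3,3,3,3,3,1,3,1)
```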

All the possible monomials construct a basis for the generalized Boolean functions. Since there are $2^m$ monomials for a given m, there are $H^{2^m}$ different generalized Boolean functions, each of which is a mapping $\mathbb{Z}_2^m \rightarrow \mathbb{Z}_H$.

If $f(x_1, x_2, \ldots, x_m)$ is over $\mathbb{R}$, the coefficient of each monomial belongs to $\mathbb{R}$, i.e., $c_k \in \mathbb{R}$, the monomials construct a vector space over $\mathbb{R}$, and the dimensionality of the space is $2^m$. Therefore, different sets of $\{c_k \mid k = 0, \ldots, 2^m - 1\}$ lead to different sequences.

Aperiodic Auto Correlation (APAC) of a Sequence

Let ρ_a(k) be the aperiodic autocorrelation of a complex sequence a of length N, where ρ_a(k) is expressed as

$\rho_a(k) \triangleq \begin{cases} \rho_a^+(k), & k \geq 0 \\ \left(\rho_a^+(-k)\right)^*, & k < 0 \end{cases} \qquad (6)$

where

$\rho_a^+(k) \triangleq \begin{cases} \sum_{i=0}^{N-k-1} a_i^* a_{i+k}, & 0 \leq k \leq N-1 \\ 0, & \text{otherwise.} \end{cases} \qquad (7)$

Complementary Sequences

The pair of (a, b) is called a Golay complementary pair (GCP) if

ρ_(a)(k)+ρ_(b)(k)=0, k≠0.  (8)

The sequence a=(a₀, a₁, . . . , a_(N-1)) is defined as a complementarysequence (CS) if there exists another sequence b=(b₀, b₁, . . . ,b_(N-1)) which complements a as ρ_(a)(k)+ρ_(b)(k)=0, k≠0.

It has been shown that the PAPR of a CS is less than or equal to 3 dB.
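The following Python sketch (illustrative only; the length-4 pair below is a classical GCP example, not one prescribed by the disclosure) verifies the property of Equation (8) numerically:

```python
import numpy as np

def apac(a: np.ndarray) -> np.ndarray:
    """Aperiodic autocorrelation rho_a^+(k) for k = 1, ..., N-1 (Equation (7))."""
    N = len(a)
    return np.array([np.sum(np.conj(a[:N - k]) * a[k:]) for k in range(1, N)])

a = np.array([1, 1, 1, -1], dtype=complex)   # classical length-4 GCP (example)
b = np.array([1, 1, -1, 1], dtype=complex)
print(apac(a) + apac(b))                     # all zeros, so (a, b) is a GCP
```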

Complementary Sequence Encoder

The following theorem for constructing complementary sequences isprovided:

Theorem: Let π denote any permutation of {1, 2, . . . , m} and (a, b) be a Golay complementary pair (GCP) of length N, and calculate

$f_o(x, z) = p_a(z)\left(1 - x_{\pi(1)}\right) + p_b(z)\, x_{\pi(1)} \qquad (9)$

$f_r(x) = e_0 + e_m x_{\pi(m)} + \sum_{l=1}^{m-1} e_l \left(x_{\pi(l)} + x_{\pi(l+1)}\right) \qquad (10)$

$f_i(x) = k_0 + \sum_{l=1}^{m} k_l x_{\pi(l)} + \frac{H}{2} \sum_{l=1}^{m-1} x_{\pi(l)} x_{\pi(l+1)} \qquad (11)$

$f_s(x) = d_0 + \sum_{n=1}^{m} d_n x_{\pi(n)} \qquad (12)$

where x = (x₁, x₂, . . . , x_m) and $x = \sum_{j=1}^{m} x_j 2^{m-j}$ for $x_j \in \mathbb{Z}_2$, $e_n \in \mathbb{R}$, $k_n \in [0, H)$, and $d_n \in \mathbb{Z}$ for n = 0, 1, . . . , m. Then, the sequence c whose polynomial representation is given by

$p_c(z) = \sum_{x=0}^{2^m - 1} f_o(x, z) \times e^{\frac{2\pi}{H}\left(f_r(x) + j f_i(x)\right)} \times z^{f_s(x) + xN} \qquad (13)$

is a complementary sequence (CS).

The polynomial p_c(z) forms an OFDM symbol for

$z = e^{\frac{j 2\pi t}{T}}$

and limits the peak-to-average-power ratio to be less than or equal to 2 (i.e., approximately 3 dB), as the sequence c is a CS. On the other hand, there is no teaching of how to choose the parameters, i.e., e_n, k_n, d_n, based on information bits, and the demonstrations are provided for random bit mappings.
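A minimal Python sketch of this construction is given below, following the reconstruction of Equations (9)-(13) above; the seed GCP, the permutation, and the parameter values are arbitrary illustrative choices, not values prescribed by the disclosure. The final check evaluates $p_c(z)$ on the unit circle, where the measured PAPR should stay at or below roughly 3 dB if c is a CS.

```python
import numpy as np

def cs_encode(a, b, pi, e, k, d, H):
    """Sketch of the CS construction of Equations (9)-(13)."""
    m, N = len(pi), len(a)
    shifts, terms = [], []
    for x in range(2 ** m):
        bits = [(x >> (m - 1 - j)) & 1 for j in range(m)]   # (x_1, ..., x_m)
        xp = [bits[p - 1] for p in pi]                      # x_{pi(1)}, ..., x_{pi(m)}
        f_r = e[0] + e[m] * xp[m - 1] + sum(
            e[l] * (xp[l - 1] + xp[l]) for l in range(1, m))         # Eq. (10)
        f_i = (k[0] + sum(k[l] * xp[l - 1] for l in range(1, m + 1))
               + (H / 2) * sum(xp[l - 1] * xp[l] for l in range(1, m)))  # Eq. (11)
        f_s = d[0] + sum(d[n] * xp[n - 1] for n in range(1, m + 1))      # Eq. (12)
        base = b if xp[0] else a                            # f_o(x, z), Eq. (9)
        w = np.exp((2 * np.pi / H) * (f_r + 1j * f_i))      # amplitude/phase weight
        shifts.append(int(f_s) + x * N)
        terms.append(w * np.asarray(base, dtype=complex))
    c = np.zeros(max(shifts) + N, dtype=complex)
    for s, term in zip(shifts, terms):
        c[s:s + N] += term                                  # Eq. (13)
    return c

# Example with a classical length-4 GCP; the parameters are arbitrary.
c = cs_encode(a=[1, 1, 1, -1], b=[1, 1, -1, 1], pi=[1, 2],
              e=[0.0, 0.1, 0.2], k=[0, 1, 2], d=[0, 0, 0], H=4)
zs = np.exp(1j * 2 * np.pi * np.arange(512) / 512)
p = np.polyval(c[::-1], zs)
# PAPR in dB; expected at or below roughly 3 dB when c is a CS
print(10 * np.log10(np.max(np.abs(p) ** 2) / np.mean(np.abs(p) ** 2)))
```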

End-to-End Learning with Deep Learning Autoencoder

Autoencoders are a specific type of neural network, where the inputs are mapped to themselves, i.e., the network is trained to approximate the identity operation. Typically, an autoencoder contains a central layer containing fewer nodes than inputs. It can be considered as two half-networks: the encoder and decoder map either to or from a reduced set of feature variables embodied in the central layer nodes. Autoencoders provide an intuitive approach to non-linear dimensionality reduction, i.e., non-linear principal component analysis (PCA).

From the communication point of view, the transmitter-channel-receiver chain can also be thought of as an autoencoder, as "the fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point." It has been shown that the transmitter can map the information bits to a higher dimensional space through a neural network (e.g., a deep neural network (DNN) or convolutional neural network (CNN)) and the receiver 200 decodes the sequence in the higher-dimensional space by also using another network. The coefficients of the networks can be obtained through an offline or online training procedure. In the literature, it has been shown that autoencoders can also be combined with OFDM-based waveforms, called AE-OFDM.

Backpropagation Algorithm

Backpropagation (BP) is a method to calculate the gradients of the learnable parameters in a network with multiple layers for a given set of inputs and a loss function. After the loss is calculated for a given set of inputs, the gradients for the learnable parameters in each layer, typically in a stochastic way, are calculated starting from the last layer and propagated to the input. After the gradients are calculated, the learnable parameters are updated based on various algorithmic methods, e.g., gradient descent.

Prior-Art Deep Learning Layers

Let $\mathcal{X} = \{x^{(i)} \mid i = 1, \ldots, N\}$ be a minibatch. The output of a layer can be expressed as $y^{(i)} = f(x^{(i)}; \theta)$, where θ corresponds to the parameter vector. The loss function can be expressed as

$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} L\left(y^{(i)}, x^{(i)}, \theta\right) \qquad (14)$

where $L(y^{(i)}, x^{(i)}, \theta)$ is the per-example loss function. Because the loss function is additive, the gradient $\nabla_\theta J(\theta)$ can be calculated as

$\nabla_\theta J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta L\left(y^{(i)}, x^{(i)}, \theta\right). \qquad (15)$

Fully-Connected Layer

The fully-connected layer can be expressed as

$y = f(x; W, b) = Wx + b \qquad (16)$

where $W \in \mathbb{R}^{K \times M}$ is a matrix containing the weights of a linear transformation and $b \in \mathbb{R}^{K \times 1}$ is a bias vector. Let $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(N)}] \in \mathbb{R}^{M \times N}$ be a matrix for the minibatch and $Y = [y^{(1)}\ y^{(2)}\ \ldots\ y^{(N)}] \in \mathbb{R}^{K \times N}$ be the output of the layer. The derivative of the loss with respect to the weight $W_{ij}$ can be calculated as

$\frac{\partial J}{\partial W_{ij}} = \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\partial y_k^{(n)}}{\partial W_{ij}} \frac{\partial J}{\partial y_k^{(n)}} = \sum_{n=1}^{N} \frac{\partial y_i^{(n)}}{\partial W_{ij}} \frac{\partial J}{\partial y_i^{(n)}} = \sum_{n=1}^{N} x_j^{(n)} \frac{\partial J}{\partial y_i^{(n)}}. \qquad (17)$

Therefore,

$\frac{\partial J}{\partial W} = \frac{\partial J}{\partial Y} X^T. \qquad (18)$

The derivative of the loss with respect to $b_i$ can be calculated as

$\frac{\partial J}{\partial b_i} = \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\partial y_k^{(n)}}{\partial b_i} \frac{\partial J}{\partial y_k^{(n)}} = \sum_{n=1}^{N} \frac{\partial J}{\partial y_i^{(n)}}. \qquad (19)$

Therefore,

$\frac{\partial J}{\partial b} = \frac{\partial J}{\partial Y} 1_{N \times 1}. \qquad (20)$

The derivative of the loss with respect to $x_k^{(n)}$ can be calculated as

$\frac{\partial J}{\partial x_k^{(n)}} = \sum_{i=1}^{K} \frac{\partial y_i^{(n)}}{\partial x_k^{(n)}} \frac{\partial J}{\partial y_i^{(n)}} = \sum_{i=1}^{K} W_{ik} \frac{\partial J}{\partial y_i^{(n)}}. \qquad (21)$

Therefore,

$\frac{\partial J}{\partial X} = W^T \frac{\partial J}{\partial Y}. \qquad (22)$
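The following Python sketch numerically checks Equations (18), (20), and (22) for a small fully-connected layer; the dimensions and the quadratic loss are arbitrary choices made only for illustration.

```python
import numpy as np

# Check dJ/dW = (dJ/dY) X^T, dJ/db = (dJ/dY) 1, and dJ/dX = W^T (dJ/dY)
# for y = Wx + b with the loss J = sum(Y**2) / (2N).
rng = np.random.default_rng(0)
K, M, N = 3, 4, 5
W = rng.normal(size=(K, M))
b = rng.normal(size=(K, 1))
X = rng.normal(size=(M, N))
Y = W @ X + b
dJ_dY = Y / N                        # gradient of J with respect to Y
dJ_dW = dJ_dY @ X.T                  # Equation (18)
dJ_db = dJ_dY @ np.ones((N, 1))      # Equation (20)
dJ_dX = W.T @ dJ_dY                  # Equation (22)

# Finite-difference check of one weight entry
eps = 1e-6
J = lambda W_: np.sum((W_ @ X + b) ** 2) / (2 * N)
Wp = W.copy(); Wp[0, 0] += eps
print(np.isclose((J(Wp) - J(W)) / eps, dJ_dW[0, 0], atol=1e-4))  # True
```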

Batchnorm Layer

Let $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(N)}] \in \mathbb{R}^{M \times N}$ and $Y = [y^{(1)}\ y^{(2)}\ \ldots\ y^{(N)}] \in \mathbb{R}^{M \times N}$ be the input and output of the layer, respectively. For the ith row of X, the batchnorm layer can be defined as

$\mu_i = \frac{1}{N} \sum_{n=1}^{N} x_i^{(n)} \qquad (23)$

$\sigma_i^2 = \frac{1}{N} \sum_{n=1}^{N} \left(x_i^{(n)} - \mu_i\right)^2 \qquad (24)$

$\hat{x}_i^{(n)} = \left(x_i^{(n)} - \mu_i\right)\left(\sigma_i^2 + \epsilon\right)^{-\frac{1}{2}} \qquad (25)$

$y_i^{(n)} = \gamma_i \hat{x}_i^{(n)} + \beta_i. \qquad (26)$

The derivative of the loss with respect to the weight $\beta_i$ can be calculated as

$\frac{\partial J}{\partial \beta_i} = \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\partial y_k^{(n)}}{\partial \beta_i} \frac{\partial J}{\partial y_k^{(n)}} = \sum_{n=1}^{N} \frac{\partial J}{\partial y_i^{(n)}}. \qquad (27)$

Therefore,

$\frac{\partial J}{\partial \beta} = \frac{\partial J}{\partial Y} 1_{N \times 1}. \qquad (28)$

The derivative of the loss with respect to the weight $\gamma_i$ can be calculated as

$\frac{\partial J}{\partial \gamma_i} = \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\partial y_k^{(n)}}{\partial \gamma_i} \frac{\partial J}{\partial y_k^{(n)}} = \sum_{n=1}^{N} \hat{x}_i^{(n)} \frac{\partial J}{\partial y_i^{(n)}}. \qquad (29)$

Therefore,

$\frac{\partial J}{\partial \gamma} = \left(\frac{\partial J}{\partial Y} \odot \hat{X}\right) 1_{N \times 1}. \qquad (30)$
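Similarly, the following sketch (with J = ΣY as an arbitrary loss chosen only to keep the check simple) evaluates Equations (28) and (30) and confirms the β gradient with a finite difference:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, eps = 3, 6, 1e-5
X = rng.normal(size=(M, N))
gamma = rng.normal(size=(M, 1))
beta = rng.normal(size=(M, 1))
mu = X.mean(axis=1, keepdims=True)                  # Equation (23)
var = X.var(axis=1, keepdims=True)                  # Equation (24)
X_hat = (X - mu) / np.sqrt(var + eps)               # Equation (25)
Y = gamma * X_hat + beta                            # Equation (26)
dJ_dY = np.ones_like(Y)                             # J = sum(Y)
dJ_dbeta = dJ_dY @ np.ones((N, 1))                  # Equation (28)
dJ_dgamma = (dJ_dY * X_hat) @ np.ones((N, 1))       # Equation (30)

h = 1e-6
beta_p = beta.copy(); beta_p[0, 0] += h
Yp = gamma * X_hat + beta_p
print(np.isclose((Yp.sum() - Y.sum()) / h, dJ_dbeta[0, 0]))  # True
```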

ReLU

The ReLU layer can be expressed as

y=f(x)=max{x, 0}  (31)

The derivative of the loss with respect to x can be calculated as

$\frac{\partial J}{\partial x} = \begin{cases} \frac{\partial J}{\partial y}, & x > 0 \\ 0, & x < 0 \end{cases} \qquad (32)$

Non-Linear Distortion Immune End-to-End Learning for OFDM

In one embodiment, the transmit bits may be mapped to the input of a first neural network (e.g., DNN, CNN) 305 (shown in FIG. 3), the output of the neural network 305 may control the parameters of a CS encoder 300 (e.g., amplitude encoder 320, phase encoder 322, and shift encoder 324), and the encoded CS at the output of the CS encoder 300 may be transmitted through an OFDM symbol. FIG. 3 shows tuning the CS encoder 300 for OFDM with a first Deep Neural Network (DNN) 305, and demodulating with a second DNN 310. For example, as shown in FIG. 3, M transmit bits, e.g., information bits, may be processed by the first DNN 305. The DNN 305 may calculate $e_n, k_n, d_n \in \mathbb{R}$ for n = 0, 1, . . . , m. The calculated parameters may be processed by the amplitude, phase, and shift encoders as

$f_r(x) = e_0 + e_m x_{\pi(m)} + \sum_{l=1}^{m-1} e_l \left(x_{\pi(l)} + x_{\pi(l+1)}\right) \qquad (33)$

$f_i(x) = k_0 + \sum_{l=1}^{m} k_l x_{\pi(l)} \qquad (34)$

$f_s(x) = d_0 + \sum_{n=1}^{m} d_n x_{\pi(n)} \qquad (35)$

where x = (x₁, x₂, . . . , x_m) and $x = \sum_{j=1}^{m} x_j 2^{m-j}$ for $x_j \in \mathbb{Z}_2$, π denotes any permutation of {1, 2, . . . , m}, and (a, b) is a GCP of length N. Then, the OFDM waveform, expressed as

$p_c(t) = \sum_{x=0}^{2^m - 1} f_o(x) \times e^{j \pi f_{sign}(x)} \times e^{\alpha f_r(x) + j \beta f_i(x)} \times e^{\frac{j 2\pi t}{T}\left(f_s(x) + xN\right)} \qquad (36)$

where

$f_o(x) = p_a\!\left(e^{\frac{j 2\pi t}{T}}\right)\left(1 - x_{\pi(1)}\right) + p_b\!\left(e^{\frac{j 2\pi t}{T}}\right) x_{\pi(1)}$

and $f_{sign}(x) = \sum_{l=1}^{m-1} x_{\pi(l)} x_{\pi(l+1)}$, and α and β are non-zero values, may be transmitted through the radio chain. The parameters α and β scale the outputs of the encoders. For example, the impacts of the amplitude encoder 320 and the phase encoder 322 on the resulting waveform vanish for α = 0 and β = 0, respectively. These parameters may be controlled through a communication network or prescribed (in a wireless standard). In one specific embodiment, these parameters may also be learned through neural networks.

The waveform may be implemented through an IDFT operation, where the encoded CS, i.e., the sequence c whose elements are the coefficients of $e^{\frac{j 2\pi t k}{T}}$ for $k \in \mathbb{Z}$, may be the sequence in the frequency domain. The shift operation shown in FIG. 3 may pad zeros to the beginning of the input sequence on each branch. The sequence on each branch may be generated through the ordering block, which yields either the sequence a or the sequence b. The sum operation in FIG. 3 may append zeros to each of the sequences to apply point-to-point summation by aligning the sequence lengths. The summed sequence may yield the encoded sequence c.

At the receiver side, a DFT-based receiver 200 (e.g., an OFDM receiver) may be used. In one method, the received signal is processed by the DFT 315 and the resulting signal in the frequency domain may be processed by the second neural network DNN 310 to recover the transmitted bits.

The overall impact of the transmitter 100, channel, and receiver 200 may be represented as an autoencoder, and the learnable parameters in each layer at the transmitter 100 and receiver 200 may be learned by using a BP algorithm. The learning may be achieved through an offline or an online learning method. For offline training, training may be performed over AWGN or Rayleigh/Rician-like channels.

In some cases, it may be important to limit the mean power besides the PAPR. To normalize the signal power, the e₀ parameter of the amplitude encoder 320 of the CS encoder 300 may be chosen as a function of e_n for n = 1, 2, . . . , m as

$e_0 = -\frac{1}{2\alpha} \sum_{n=1}^{m} \ln\left(\frac{1 + e^{2\alpha e_n}}{2}\right). \qquad (37)$

Hence, the neural network may provide m values for e_n for n = 1, 2, . . . , m. Note that the variable e_n scales half of the coefficients by $e^{\alpha e_n}$. Therefore, the encoded CS power is scaled up by

$\frac{1 + e^{2\alpha e_n}}{2}.$

For all e_n for n = 1, 2, . . . , m, the total power scaling factor can be calculated as

$\gamma = \prod_{n=1}^{m} \frac{1 + e^{2\alpha e_n}}{2}.$

Therefore, to normalize the sequence power, e₀ should be selected such that

$e^{2\alpha e_0} = \frac{1}{\gamma}.$

As a result,

$e_0 = \frac{1}{2\alpha} \ln\frac{1}{\gamma} = -\frac{1}{2\alpha} \sum_{n=1}^{m} \ln\left(\frac{1 + e^{2\alpha e_n}}{2}\right).$

In case of real-valued output for the shift encoder (e.g., non-integer values for $D_0, D_1, \ldots, D_{2^m - 1}$, where $D_x$ denotes $f_s(x)$ for x = (x₁, x₂, . . . , x_m) and $x = \sum_{j=1}^{m} x_j 2^{m-j}$ for $x_j \in \mathbb{Z}_2$), the transmitter 100 may generate the waveform by sampling the summation of the polynomial output, as illustrated in FIG. 4, where the polynomials on each branch may be sampled at the rate of f_s. FIG. 4 shows a transmitter diagram for the transmitter 100 for real-valued output of the shift encoder 324.

Learning Amplitude and Phase Bit Mappings and Golay Layer

In one method, to reduce the training complexity, the shift encoder 324 may be controlled manually (e.g., either by a communication network or as prescribed) to adjust the position of the non-zero elements of the encoded sequence, while the amplitude encoder 320 and phase encoder 322 may be tuned with a neural network 305 and the information bits are mapped to the tuned parameters.

As exemplified in FIG. 5, the information bits may be processed by multiple layers (e.g., N_k layers at the transmitter 100). These layers may be a combination of some prior-art layers such as softmax, ReLU, batchnorm, convolution layers, fully-connected layers, etc. Then, the output of these layers, e.g., vectors, may pass through a set of clipping layers 500 to avoid large numbers. The clipping layer 500, described herein, generates m inputs for the amplitude encoder and m+1 inputs for the phase encoder of the normalized Golay encoding layer 505 described herein. The transmitted waveform in continuous time may be calculated as

p  ( t ) = ∑ x = 0 2 m - 1  f ?  ( x ) × e j   π   f sign  ( x )× e α   f r  ( x ) + ?  β   f ?  ( x ) × e j   2  π   t T ( f ?  ( x ) + xN )   ?  indicates text missing or illegible whenfiled ( 38 )

where

${f_{o}(x)} = \left( {{{p_{a}\left( e^{\frac{j\; 2\pi \; t}{T}} \right)}\left( {1 - x_{\pi {(1)}}} \right)} + {{p_{b}\left( e^{\frac{j\; 2\pi \; t}{T}} \right)}x_{\pi {(1)}}}} \right)$

and f_(sign)(x)=Σ_(i=1) ^(m−1)x_(π(l))x_(π(l+1)) and f_(s)(x)=d₀+Σ_(n=1)^(m)d_(n)x_(π(n)) are fixed. The shift encoder f_(s)(x) may beconfigured based on the resource allocation indicated by the network. Acyclic prefix may also be prepended to the transmitted signal. Thetransmitter 100 may use a DFT operation to implement p_(c)(t), as in anOFDM scheme. FIG. 5 shows controlling only amplitude and phase encodersof Golay layer 505 with a DNN over an OFDM transmission/reception andforming an autoencoder which captures transmitter 100, channel, andreceiver 200 behaviors.

At the receiver side, an OFDM receiver 200 may be used. In one method, the received signal is processed by the DFT 315 and the resulting signal in the frequency domain is first processed by a matched filter obtained by exploiting the fixed f_o(x). The resulting sequence may be processed by a deep neural network DNN 310 to recover the transmitted bits, e.g., M_L layers may be utilized. The layers at the receiver side may be a combination of some prior-art layers such as softmax, ReLU, batchnorm, convolution layers, fully-connected layers, etc.

The overall impact of the transmitter 100, channel, and receiver 200 may be represented as an autoencoder, as shown in FIG. 5, and the learnable parameters in each layer may be learned by using a BP algorithm. The learning may be achieved through an offline or an online learning method. A Polar-to-Cartesian layer 510 described herein may also be used during the offline/online training.

Golay Layer

The output of a Golay layer 505 may be a set of sequences associated with f_r(x) and f_i(x) (i.e., the sequences generated by listing the values of the functions as x = (x₁, x₂, . . . , x_m) ranges over its $2^m$ values, where $x = \sum_{j=1}^{m} x_j 2^{m-j}$) given by

$f_r(x) = e_0 + e_m x_{\pi(m)} + \sum_{l=1}^{m-1} e_l \left(x_{\pi(l)} + x_{\pi(l+1)}\right) \qquad (39)$

$f_i(x) = k_0 + \sum_{l=1}^{m} k_l x_{\pi(l)} \qquad (40)$

where the inputs of a Golay layer 505 may be $e_n \in \mathbb{R}$ and $k_n \in \left[0, \frac{2\pi}{\beta}\right)$ for n = 0, 1, . . . , m. In one embodiment, the parameter e₀ in the Golay layer 505 may be chosen as a function of $e_n \in \mathbb{R}$ for n = 1, 2, . . . , m as

$e_0 = -\frac{1}{2\alpha} \sum_{n=1}^{m} \ln\left(\frac{1 + e^{2\alpha e_n}}{2}\right). \qquad (41)$

The derivative of f_r(x) with respect to e_n can be obtained as

$\frac{d f_r(x)}{d e_n} = \begin{cases} \left(x_{\pi(n)} + x_{\pi(n+1)}\right) - \frac{e^{2\alpha e_n}}{1 + e^{2\alpha e_n}}, & n < m \\ x_{\pi(m)} - \frac{e^{2\alpha e_n}}{1 + e^{2\alpha e_n}}, & n = m \end{cases} \qquad (42)$

since

$\frac{d e_0}{d e_n} = -\frac{e^{2\alpha e_n}}{1 + e^{2\alpha e_n}}.$

Similarly, the derivatives of f_i(x) with respect to k_n and k₀ can be calculated as

$\frac{d f_i(x)}{d k_n} = x_{\pi(n)} \qquad (43)$

and

$\frac{d f_i(x)}{d k_0} = 1. \qquad (44)$

The derivative of the loss with respect to $e_k^{(n)}$ can be calculated as

$\frac{\partial J}{\partial e_k^{(n)}} = \sum_{x=1}^{2^m} \frac{\partial f_r^{(n)}(x)}{\partial e_k^{(n)}} \frac{\partial J}{\partial f_r^{(n)}(x)} = \sum_{x=1}^{2^m} \Lambda_{x,k} \frac{\partial J}{\partial f_r^{(n)}(x)}. \qquad (45)$

The derivative of the loss with respect to $k_k^{(n)}$ can be calculated as

$\frac{\partial J}{\partial k_k^{(n)}} = \sum_{x=1}^{2^m} \frac{\partial f_i^{(n)}(x)}{\partial k_k^{(n)}} \frac{\partial J}{\partial f_i^{(n)}(x)}. \qquad (46)$

The derivative of the loss with respect to $k_0^{(n)}$ can be calculated as

$\frac{\partial J}{\partial k_0^{(n)}} = \sum_{x=1}^{2^m} \frac{\partial f_i^{(n)}(x)}{\partial k_0^{(n)}} \frac{\partial J}{\partial f_i^{(n)}(x)} = \sum_{x=1}^{2^m} \frac{\partial J}{\partial f_i^{(n)}(x)}. \qquad (47)$

The derivative of the loss with respect to $d_k^{(n)}$ can be calculated as

$\frac{\partial J}{\partial d_k^{(n)}} = \sum_{x=1}^{2^m} \frac{\partial f_s^{(n)}(x)}{\partial d_k^{(n)}} \frac{\partial J}{\partial f_s^{(n)}(x)}. \qquad (48)$

The derivative of the loss with respect to $d_0^{(n)}$ can be calculated as

$\frac{\partial J}{\partial d_0^{(n)}} = \sum_{x=1}^{2^m} \frac{\partial f_s^{(n)}(x)}{\partial d_0^{(n)}} \frac{\partial J}{\partial f_s^{(n)}(x)} = \sum_{x=1}^{2^m} \frac{\partial J}{\partial f_s^{(n)}(x)}. \qquad (49)$

Therefore,

$\frac{\partial J}{\partial e} = \Lambda^T \frac{\partial J}{\partial Y} \qquad (50)$

$\frac{\partial J}{\partial k} = \Omega^T \frac{\partial J}{\partial Y} \qquad (51)$

$\frac{\partial J}{\partial d} = S^T \frac{\partial J}{\partial Y} \qquad (52)$

where

$\Lambda = \begin{bmatrix} \left(x_{\pi(1)} + x_{\pi(2)}\right) & \left(x_{\pi(2)} + x_{\pi(3)}\right) & \ldots & \left(x_{\pi(m-1)} + x_{\pi(m)}\right) & x_{\pi(m)} \end{bmatrix} - \frac{e^{2\alpha e_n}}{1 + e^{2\alpha e_n}} \qquad (53)$

$\Omega = S = \begin{bmatrix} 1 & x_{\pi(1)} & x_{\pi(2)} & \ldots & x_{\pi(m)} \end{bmatrix} \qquad (54)$

where $e = \begin{bmatrix} e_1 & e_2 & \ldots & e_m \end{bmatrix}$, $k = \begin{bmatrix} k_0 & k_1 & \ldots & k_m \end{bmatrix}$, and $d = \begin{bmatrix} d_0 & d_1 & \ldots & d_m \end{bmatrix}$.
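The following Python sketch (using an identity permutation and arbitrary parameter values, for illustration only) checks Equation (42) against a finite difference when e₀ is tied to e₁, . . . , e_m through Equation (41):

```python
import numpy as np

alpha, m = 1.0, 3
pi = [1, 2, 3]                                     # identity permutation (example)

def f_r(e, x_bits):
    """f_r(x) of Equation (39) with e_0 given by Equation (41)."""
    xp = [x_bits[p - 1] for p in pi]
    e0 = -1 / (2 * alpha) * np.sum(np.log((1 + np.exp(2 * alpha * e)) / 2))
    return e0 + e[m - 1] * xp[m - 1] + sum(
        e[l - 1] * (xp[l - 1] + xp[l]) for l in range(1, m))

e = np.array([0.3, -0.2, 0.4])
x_bits, n = [1, 0, 1], 1                           # check d f_r / d e_1 (n < m)
analytic = (x_bits[0] + x_bits[1]) - np.exp(2 * alpha * e[n - 1]) / (
    1 + np.exp(2 * alpha * e[n - 1]))              # Equation (42)
h = 1e-6
ep = e.copy(); ep[n - 1] += h
print(np.isclose((f_r(ep, x_bits) - f_r(e, x_bits)) / h, analytic, atol=1e-4))
```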

Clipping Layer

The clipping layer 500 can be expressed as

y=f(x)=min{max{x, r ₁ }, r ₂}  (55)

The derivative of the loss with respect to x can be calculated as

$\frac{\partial J}{\partial x} = \begin{cases} 0, & x > r_2 \\ \frac{\partial J}{\partial y}, & r_1 \leq x \leq r_2 \\ 0, & x < r_1 \end{cases} \qquad (56)$

Polar-to-Cartesian Layer

The Polar-to-Cartesian layer can be expressed as

$y_r = f(x, y) = \Re\left\{ e^{\alpha y} e^{j \beta x} \right\} = e^{\alpha y} \cos(\beta x) \qquad (57)$

$y_i = g(x, y) = \Im\left\{ e^{\alpha y} e^{j \beta x} \right\} = e^{\alpha y} \sin(\beta x). \qquad (58)$

The derivatives of the loss with respect to x and y can be calculated as

$\frac{\partial J}{\partial x} = \frac{\partial y_r}{\partial x}\frac{\partial J}{\partial y_r} + \frac{\partial y_i}{\partial x}\frac{\partial J}{\partial y_i} = \begin{bmatrix} -\beta e^{\alpha y} \sin(\beta x) & \beta e^{\alpha y} \cos(\beta x) \end{bmatrix} \begin{bmatrix} \frac{\partial J}{\partial y_r} \\ \frac{\partial J}{\partial y_i} \end{bmatrix} \qquad (59)$

$\frac{\partial J}{\partial y} = \frac{\partial y_r}{\partial y}\frac{\partial J}{\partial y_r} + \frac{\partial y_i}{\partial y}\frac{\partial J}{\partial y_i} = \begin{bmatrix} \alpha e^{\alpha y} \cos(\beta x) & \alpha e^{\alpha y} \sin(\beta x) \end{bmatrix} \begin{bmatrix} \frac{\partial J}{\partial y_r} \\ \frac{\partial J}{\partial y_i} \end{bmatrix} \qquad (60)$
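The following sketch (with arbitrary α, β, inputs, and upstream gradients) verifies Equation (59) with a finite difference; Equation (60) can be checked the same way:

```python
import numpy as np

alpha, beta = 1.0, np.pi / 2                    # arbitrary scaling parameters
x, y = 0.7, -0.3                                # arbitrary layer inputs
dJ_dyr, dJ_dyi = 0.5, -1.2                      # arbitrary upstream gradients
dJ_dx = (-beta * np.exp(alpha * y) * np.sin(beta * x) * dJ_dyr
         + beta * np.exp(alpha * y) * np.cos(beta * x) * dJ_dyi)   # Eq. (59)

# Finite-difference check: J is linear in (y_r, y_i) with the given gradients.
J = lambda x_, y_: (np.exp(alpha * y_) * np.cos(beta * x_) * dJ_dyr
                    + np.exp(alpha * y_) * np.sin(beta * x_) * dJ_dyi)
h = 1e-7
print(np.isclose((J(x + h, y) - J(x, y)) / h, dJ_dx, atol=1e-5))  # True
```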

EXAMPLE

Assume that 9 information bits need to be transmitted. In this case, the communication system may need to generate M = 2⁹ = 512 codewords. Let m = 5, H = 4, α = 1, and

$\beta = \frac{2\pi}{H} = \frac{\pi}{2}$

and let the permutation be π = (1, 2, 3, 4, 5). Therefore, based on Equations (39), (40) and (41), the Golay encoder may be expressed as

$\begin{matrix}{\mspace{79mu} {{f_{r}(x)} = {e_{\text{?}} + {e_{\text{?}}x_{\text{?}}} + {\sum\limits_{l = 1}^{4}{e_{l}\left( {x_{l} + x_{\text{?}}} \right)}}}}} & (61) \\{\mspace{79mu} {{{f_{i}(x)} = {k_{D} + {\sum\limits_{i = 1}^{5}{k_{l}x_{l}}}}}{\text{?}\text{indicates text missing or illegible when filed}}}} & (62)\end{matrix}$

where the inputs of the Golay layer 505 may be $e_n \in \mathbb{R}$ and $k_n \in [0, 4)$ for n = 0, 1, . . . , 5, where

$e_0 = -\frac{1}{2} \sum_{n=1}^{5} \ln\left(\frac{1 + e^{2 e_n}}{2}\right). \qquad (63)$

The OFDM waveform may be expressed as

$p_c(t) = \sum_{x=0}^{2^5 - 1} e^{j \pi f_{sign}(x)} \times e^{f_r(x) + j \beta f_i(x)} \times e^{\frac{j 2\pi t}{T}(xN)} \qquad (64)$

where $f_{sign}(x) = \sum_{l=1}^{m-1} x_l x_{l+1}$. In this example, the learning layers at the transmitter 100 control the Golay layer parameters as 0.1 ≤ e_n ≤ 0.5 for n = 1, 2, . . . , 5 and −2 ≤ k_n ≤ 2 for n = 0, 1, . . . , 5. M = 512 different possible messages may be mapped to the values of e_n for n = 1, 2, . . . , 5 and k_n for n = 0, 1, . . . , 5. The parameter e₀ is set to

$e_0 = -\frac{1}{2} \sum_{n=1}^{5} \ln\left(\frac{1 + e^{2 e_n}}{2}\right)$

to stabilize the mean power of the signal, which is critical for power-limited AI-based transmission. At the receiver 200, an OFDM receiver may be considered and the corresponding subcarriers are processed with several neural network layers. At the receiver 200, the last layer may be a classification layer whose size is M = 512, as there are 512 different codewords. The transmitter 100 and receiver 200 may be considered as an autoencoder, and the expected behavior is the identity operation (i.e., receiving the transmitted message). In other words, if the first codeword (e.g., information bits (0,0,0,0,0,0,0,0,0)) is transmitted at the transmitter 100, the first element of the classification layer may be closer to 1 while the other 511 elements are near 0 (i.e., one-hot vector encoding form). If the second codeword (e.g., information bits (0,0,0,0,0,0,0,0,1)) is transmitted at the transmitter 100, the second element of the classification layer may be closer to 1 while the other 511 elements are near 0.

An offline training with backpropagation is adopted. The design may be performed under an AWGN channel, where the variance of the noise is set to 2.5 dB in this example. The layer information at the transmitter 100 and receiver 200 in this example is provided in Table 1 (FIG. 9). The training batch size is set to 100, i.e., the gradients are calculated over 100 different sets of 9 information bits. After the layers at the transmitter 100 and receiver 200 are trained, the learned parameters may be used in an OFDM transmission as illustrated in FIG. 5.

In FIG. 6, we provide the block error rate (BLER), bit error rate (BER), and spectral efficiency (SE) as compared to the Shannon limit for the AI-based learning with the Golay layer and a Polar code under the same SE (i.e., 9 bits over 32 subcarriers) for OFDM transmission. The Polar code is optimized at 3 dB SNR. AI-based learning with the Golay layer is slightly better than the Polar code in this scenario in terms of BLER and BER, while it offers a major improvement for PAPR, i.e., more than 7 dB PAPR gain. While the Golay layer keeps the PAPR less than or equal to 3 dB, the Polar code causes large PAPR, i.e., 10.8 dB PAPR at the 90th percentile, while also causing large fluctuations of the mean power (13.8 dB at the 90th percentile). The PAPR distributions of the signals are provided in FIG. 7. FIG. 6 shows the block error rate, bit error rate, and spectral efficiency of the AI-based learning with the Golay layer. FIG. 7 shows a PAPR comparison.

In FIG. 8, we show the learned constellation per subcarrier, where the points marked by a circle, plus, and cross indicate the elements of three different encoded sequences in the frequency domain for each OFDM subcarrier. Although the constellation per subcarrier does not follow traditional constellations (e.g., an M-QAM alphabet), the sequences are still distinguishable at the receiver side with good BER/BLER performance. FIG. 8 shows the distribution of the elements of the sequence on different subcarriers (i.e., the learned constellation per subcarrier).

Learning Amplitude and Phase Offsets for Higher Data Rates

In one method, the sets for the phase and amplitude parameters e_n, k_n for n=0, 1, . . . , m may be predetermined, and their sets are offset by neural networks to be able to transmit a larger number of bits. For example, e_n∈e_(fix,n)+Δe_n and k_n∈k_(fix,n)+Δk_n, where e_(fix,n)∈S={−0.5, 0.8, 1.2, 1.5}, k_(fix,n)∈Z₄={0, 1, 2, 3}, and a neural network is trained to obtain {Δk_n} and {Δe_n} for n=0, 1, . . . , m. In another method, the neural network may obtain multiple sets of {Δk_n} and {Δe_n} for n=0, 1, . . . , m.

At the receiver 200, a neural network DNN 310 may be utilized along with a traditional decoder (no learning capability). While the neural network 310 subtracts the impact of the offsets {Δk_n} and {Δe_n} from the received signal, the traditional decoder may decode the remaining signal.
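The offset parameterization and its removal at the receiver can be illustrated with the short sketch below; the grid indices, offset values, and hard-decision rule are hypothetical placeholders used only to make the data flow concrete.

```python
import numpy as np

# Transmitter side: each parameter is a predetermined grid point (carrying
# the data bits) plus a small learned offset.
S = np.array([-0.5, 0.8, 1.2, 1.5])        # e_fix,n alphabet (2 bits per index)
Z4 = np.array([0.0, 1.0, 2.0, 3.0])        # k_fix,n alphabet (2 bits per index)

idx_e = np.array([0, 1, 2, 3, 0])          # example data-bearing indices for e_fix,n
idx_k = np.array([1, 3, 0, 2, 1, 0])       # example data-bearing indices for k_fix,n
delta_e = 0.05 * np.ones(5)                # learned offsets {Δe_n} (placeholder values)
delta_k = -0.10 * np.ones(6)               # learned offsets {Δk_n} (placeholder values)

e = S[idx_e] + delta_e                     # e_n = e_fix,n + Δe_n
k = Z4[idx_k] + delta_k                    # k_n = k_fix,n + Δk_n

# Receiver side: once the neural network estimates and subtracts the offsets,
# a conventional (non-learning) decoder only sees the fixed grid.
k_clean = k - delta_k                      # offset impact removed
idx_hat = np.argmin(np.abs(k_clean[:, None] - Z4[None, :]), axis=1)
print(np.array_equal(idx_hat, idx_k))      # True: grid indices recovered
```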

The different parts of the methods disclosed herein may be combined. They can also be applied in other fields (e.g., computer vision, image processing, etc.). For example, the Golay layer 505 may be utilized to limit the distribution of the values at the inputs of the following layer.

The current disclosure has demonstrated the efficacy of the methods disclosed herein through computer analysis. The current disclosure may be implemented in a baseband/RF chipset of a radio communication device. The method may also be prescribed in a wireless communication standard that allows machine learning-based communications.

The disclosed methods can also be applied to communication devices that operate under power-limited link budgets while autonomously optimizing their error rate performance for autoencoder-OFDM. The disclosed method may decrease the training duration, as the transmitter 100 and receiver 200 do not need to deal with the PAPR problem when the Golay layer 505 is used; the Golay layer 505 itself ensures low PAPR.

While the present subject matter has been described in detail with respect to specific exemplary embodiments and methods thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, the scope of the present disclosure is by way of example rather than by way of limitation, and the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art using the teachings disclosed herein.

What is claimed is:
1. A method for avoiding non-linear distortion in end-to-end learning communication systems, the communication system comprising a transmitter and a receiver, the method comprising: mapping transmitted information bits to an input of a first neural network; controlling, by an output of the neural network, parameters of a complementary sequence (CS) encoder, producing an encoded CS; transmitting the encoded CS through an orthogonal frequency division multiplexing (OFDM) signal; processing, by Discrete Fourier Transform (DFT), the encoded CS in a frequency domain, to produce a received information signal; and processing, by a second neural network, the received information signal.
2. The method of claim 1, wherein the CS encoder comprises an amplitude encoder, a phase encoder, and a shift encoder.
3. The method of claim 2, wherein mapping the transmitted information bits to an input of a first neural network further comprises: manually tuning the shift encoder to adjust a position of non-zero elements of the CS; tuning the amplitude encoder and the phase encoder using the first neural network to produce tuned parameters; and mapping the information bits to the tuned parameters.
4. The method of claim 1, wherein the encoded CS is processed by multiple layers at the transmitter, the layers including at least a Golay layer, the method further comprising: controlling only the amplitude encoder and the phase encoder of the Golay layer; and forming an autoencoder which captures transmitter, channel, and receiver behaviors.
5. The method of claim 2, wherein sets for the amplitude encoder and phase encoder are predetermined and offset by the first neural network in order to be able to transmit a large number of information bits.
6. The method of claim 5, wherein the receiver further comprises a decoder, the method further comprising: subtracting, by the second neural network, the offsets from the received information signal to produce a remaining information signal; and decoding, by the decoder, the remaining information signal.
7. The method of claim 4, wherein the layers further include a clipping layer configured to limit the amplitude of the information signal.
8. The method of claim 4, wherein the layers further include a Polar-to-Cartesian layer configured to convert the coordinate system from Polar coordinates to a Cartesian coordinate system.
9. An end-to-end learning communication system for avoiding non-linear distortion, the system comprising: a transmitter implemented by processing circuitry, the processing circuitry comprising a processor and a memory containing instructions executable by the processor, the processor of the transmitter configured to: map transmitted information bits to an input of a first neural network; control, by an output of the neural network, parameters of a complementary sequence (CS) encoder, producing an encoded CS; and transmit the encoded CS through an orthogonal frequency division multiplexing (OFDM) signal; and a receiver implemented by processing circuitry, the processing circuitry comprising a processor and a memory containing instructions executable by the processor, the processor of the receiver configured to: process, by Discrete Fourier Transform (DFT), the encoded CS in a frequency domain, to produce a received information signal; and process, by a second neural network, the received information signal.
10. The system of claim 9, wherein the CS encoder comprises an amplitude encoder, a phase encoder, and a shift encoder.
11. The system of claim 10, wherein mapping the transmitted information bits to an input of a first neural network further comprises: manually tuning, by the processor of the transmitter, the shift encoder to adjust a position of non-zero elements of the CS; tuning, by the processor of the transmitter, the amplitude encoder and the phase encoder using the first neural network to produce tuned parameters; and mapping, by the processor of the transmitter, the information bits to the tuned parameters.
12. The system of claim 10, wherein the encoded CS is processed by multiple layers at the transmitter, the layers including at least a Golay layer, the processor of the transmitter further configured to: control only the amplitude encoder and the phase encoder of the Golay layer; and form an autoencoder which captures transmitter, channel, and receiver behaviors.
13. The system of claim 10, wherein sets for the amplitude encoder and phase encoder are predetermined and offset by the first neural network in order to be able to transmit a large number of information bits.
14. The system of claim 13, wherein the processor of the receiver is further configured to: subtract, by the second neural network, the offsets from the received information signal to produce a remaining information signal; and decode, by the decoder, the remaining information signal.
15. The system of claim 12, wherein the layers further include a clipping layer configured to limit the amplitude of the information signal.
16. The system of claim 12, wherein the layers further include a Polar-to-Cartesian layer configured to convert the coordinate system from Polar coordinates to a Cartesian coordinate system.